In this document we construct a dataset for all parks in Alameda County. We begin with a parks polygons layer and append amenity data collected from OpenStreetMap through the OverPass API.
We also compile a dataset of block groups in Alameda County with demographic attributes.
We obtained a polygons shapefile layer representing open spaces in Alameda County, California from the California Protected Areas Database [@cpad2019]. This dataset was selected because it included multiple different types of open space including local and state parks, traditional green spaces as well as wildlife refuges and other facilities that can be used for recreation. We removed facilities that did not allow open access to the public (such as the Oakland Zoo) and facilities whose boundaries conflated with freeway right-of-way –– this prevents trips through the park from being conflated with park trips in the passive origin-destination data.
not_parks <- c("3455", "15886", "13243")
parks <- st_read(here("data/bayarea_parks.geojson"), quiet = TRUE) %>%
filter(COUNTY == "Alameda") %>%
transmute(
id = as.character(UNIT_ID), name = UNIT_NAME,
access = factor(ACCESS_TYP, levels = c("Open Access", "No Public Access",
"Restricted Access")),
acres = ACRES, type = DES_TP) %>%
filter(!id %in% not_parks) %>%
filter(access == "Open Access") %>%
select(id, access, acres, type) %>%
st_transform(this_crs)
type_pal <- colorFactor("Dark2", parks$type)
leaflet(parks %>% st_transform(4326)) %>%
addProviderTiles(basetiles) %>%
addPolygons(color = ~type_pal(type))
The park amenities are drawn from OpenStreetMap data collected through the Overpass API. Each overpass query uses the same bounding box, which is a polygon of Alameda County.
bb <- getbb("Alameda County, California", format_out = "polygon")
Playgrounds are defined with the leisure = playground tag and key. See details on the OSM Wiki. Playgrounds can be either points or polygons in OpenStreetMap, though polygons are preferred. We assume that playgrounds are located at the centroid of their polygon or at the point.
playground_osm <- opq(bb) %>% # specify boundary for query
add_osm_feature(key = "leisure", value = "playground") %>% # specify which kinds of data we want
osmdata_sf() %>% # get a list of sf data frames for these tags
trim_osmdata(bb, exclude = TRUE)
playgrounds <- rbind(
# convert polygons to centroids
playground_osm$osm_polygons %>%
st_transform(this_crs) %>%
st_centroid() %>%
select(osm_id),
# get points
playground_osm$osm_points %>%
st_transform(this_crs) %>%
select(osm_id)
# no multipolygons or polylines that need to be included
) %>%
mutate(playground = TRUE)
## Warning in st_centroid.sf(.): st_centroid assumes attributes are constant over
## geometries of x
Below is a map of the playgrounds we could potentially include in the analysis. It is worth noting that this dataset could be incomplete. It also likely includes playgrounds that are affiliated with private gardens or schools, which will not be in the parks dataset. This is a limitation we will have to underline, but might not be able to address.
leaflet(playgrounds %>% st_transform(4326)) %>%
addProviderTiles(basetiles) %>%
addCircleMarkers(color = "blue", radius = 2)
Next up we have ball fields and pitches. These are defined with the leisure = "pitch" tag and key, see details on the OSM Wiki. Most of these elements are points, with some polygons that we convert to centroids. We eliminated pitches for golf as these are unlikely to drive urban park trips.
pitches_osm <- opq(bb) %>% # specify boundary for query
add_osm_feature(key = "leisure", value = "pitch") %>%
osmdata_sf() %>%
trim_osmdata(bb, exclude = TRUE)
pitches <- rbind(
# convert polygons to centroids
pitches_osm$osm_polygons %>%
st_transform(this_crs) %>%
st_centroid() %>%
select(osm_id, sport),
# get points
pitches_osm$osm_points %>%
st_transform(this_crs) %>%
select(osm_id, sport)
# no multipolygons or polylines that need to be included
) %>%
filter(sport != "golf") %>%
mutate(
sport = case_when(
grepl("football", sport) | grepl("soccer", sport) ~ "football / soccer",
grepl("baseball", sport) | grepl("softball", sport) ~ "baseball",
grepl("basketball", sport) ~ "basketball",
grepl("tennis", sport) ~ "tennis",
grepl("volleyball", sport) ~ "volleyball",
TRUE ~ "other"
)
)
## Warning in st_centroid.sf(.): st_centroid assumes attributes are constant over
## geometries of x
As before, we likely include in this query pitches that are part of private athletic clubs or schools. These will be dropped in the spatial join to follow.
pitch_sport <- colorFactor("Dark2", pitches$sport)
leaflet(pitches %>% st_transform(4326)) %>%
addProviderTiles(basetiles) %>%
addCircleMarkers(color = ~pitch_sport(sport), radius = 2) %>%
addLegend(pal = pitch_sport, values = ~ sport)
trails_osm <- opq(bb) %>% # specify boundary for query
add_osm_feature(key = "highway", value = c("footway", "cycleway", "path")) %>%
osmdata_sf() %>%
trim_osmdata(bb, exclude = FALSE)
trails <- rbind(
# points are nodes, so we can skip them.
# lines
trails_osm$osm_lines %>%
st_transform(this_crs) %>%
select(osm_id, bicycle, foot, horse),
# circular paths get converted to polygons, so we need to
# turn them back into lines.
trails_osm$osm_polygons %>%
st_transform(this_crs) %>%
st_boundary() %>%
select(osm_id, bicycle, foot, horse)
) %>%
mutate(length = st_length(.))
leaflet(trails %>% st_transform(4326)) %>%
addProviderTiles(basetiles) %>%
addPolylines()
We now need to establish whether the parks intersect geometrically with the features we have identified, thereby establishing whether the parks have the amenities we care about.
attributed_parks <- parks %>%
# determine if a playground point is inside the park polygon boundary
st_join(playgrounds %>% select(playground), st_intersects) %>%
# a park with multiple playgrounds will join multiple times, so we need to
# summarize them back down.
group_by(id) %>% slice(1) %>% ungroup() %>%
mutate(playground = ifelse(is.na(playground), F, playground)) %>%
# determine if the pitch points are inside the park polygon boundaries
st_join(pitches %>% select(sport)) %>%
# multiple pitches in a park result in repeated rows.
group_by(id) %>% slice(1) %>% ungroup() %>%
mutate(pitch = ifelse(is.na(sport), F, T)) %>%
select(-sport) %>%
# determine if the trails are inside the park polygon boundaries
st_join(trails %>% select(trail = osm_id), st_intersects) %>%
# multiple pitches in a park result in repeated rows.
group_by(id) %>% slice(1) %>% ungroup() %>%
mutate(trail = ifelse(is.na(trail), F, T))
The table below shows the distribution the attributes of the r nrow(attributed_parks) parks.
our_summary1 <- list(
"Acres" =
list("min" = ~ min(.data$acres),
"max" = ~ max(.data$acres),
"mean (sd)" = ~ qwraps2::mean_sd(.data$acres)),
"Attributes" =
list("Has Playground" = ~qwraps2::n_perc0(.data$playground),
"Has Pitch / Court" = ~qwraps2::n_perc0(.data$pitch),
"Has Paths" = ~qwraps2::n_perc0(.data$trail))
)
attributed_parks %>% st_set_geometry(NULL) %>%
qwraps2::summary_table(our_summary1) %>%
print(markup = "markdown")
| . (N = 500) | |
|---|---|
| Acres | |
| min | 0.038 |
| max | 4939.817 |
| mean (sd) | 67.57 \(\pm\) 388.79 |
| Attributes | |
| Has Playground | 225 (45) |
| Has Pitch / Court | 181 (36) |
| Has Paths | 321 (64) |
Now, we turn to the block groups in Alameda County. For each block group, we retrieve socioeconomic data from the 2014-2018 American Community Survey. The variables we obtain include:
B25008), housing units (B25001), and households (B19001)B19013)B19001)B02001) plus households of hispanic origin (B03001)With this raw data, we compute several additional variables that we will use in our analysis. We define the percent of households in each block group as the share of households under $35k / year, and high-income households as those over $125k. Household density is the number of households per square kilometer. Hispanic / Latino descent is only available at the tract level.
Need to find a better way to capture hispanic populations, but we are moving on for now.
variables <- c(
"population" = "B25008_001", # TOTAL POPULATION IN OCCUPIED HOUSING UNITS BY TENURE
"housing_units" = "B25001_001", # HOUSING UNITS
"households" = "B19001_001", #HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2017 INFLATION-ADJUSTED DOLLARS)
#Estimate!!Total!!Female!!Worked in the past 12 months!!Usually worked 35 or more hours per week
# RACE
"black" = "B02001_003",
"asian" = "B02001_005",
"pacific" = "B02001_006",
"nativeam" = "B02001_004",
"other" = "B02001_007",
# HISPANIC OR LATINO ORIGIN BY SPECIFIC ORIGIN
# The number of hispanic individuals needs to be drawn from a different table.
# But this is only available at the tract level, where it appears to be roughly
# collinear with the "some other race alone"
"hispanic" = "B03001_003",
#MEDIAN HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2017 INFLATION-ADJUSTED DOLLARS)
"income" = "B19013_001",
#HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 2017 INFLATION-ADJUSTED DOLLARS)
"inc_0010" = "B19001_002", "inc_1015" = "B19001_003", "inc_1520" = "B19001_004",
"inc_2025" = "B19001_005", "inc_2530" = "B19001_006", "inc_3035" = "B19001_007",
"inc_125" = "B19001_015", "inc_150" = "B19001_016", "inc_200" = "B19001_017"
)
acs_bg <- get_acs(geography = "block group", variables = variables, year = 2018,
state = "CA", county = "001", geometry = TRUE)
acs <- acs_bg %>%
select(-moe) %>%
spread(variable, estimate) %>%
mutate(
area = as.numeric(st_area(geometry) * 1e-6),
) %>%
select(-hispanic) %>%
# area is in m^2, change to km^2
transmute(
geoid = GEOID,
population, households, housing_units,
density = households / area,
income,
# many of the variables come in raw counts, but we want to consider
# them as shares of a relevant denominator.
lowincome = 100 * (inc_0010 + inc_1015 + inc_1520 + inc_2530 +
inc_3035) / households,
highincome = 100 * (inc_125 + inc_150 + inc_200) / households,
black = 100 * black / population,
asian = 100 * asian / population,
other = 100 * (nativeam + pacific + other) / population,
minority = black + other
)
tracts_data_index <- tibble(
variable = c("density",
"income", "lowincome", "highincome",
"black", "asian", "other")
) %>%
mutate(order = row_number())
acs %>%
st_set_geometry(NULL) %>%
filter(population > 0) %>%
tbl_df() %>%
dplyr::select(tracts_data_index$variable) %>%
gather(variable, value) %>%
group_by(variable) %>%
summarise(
`msd` = paste(median_iqr(value, na_rm = TRUE, show_n = "never", digits = 1))
) %>%
inner_join(tracts_data_index, by = "variable") %>%
arrange(order) %>%
mutate(
variable = c(
"Density: Households per square kilometer",
"Income: Median tract income",
"Low Income: Share of households making less than $35k",
"High Income: Share of households making more than $125k",
"Black: Share of population who is Black",
"Asian: Share of population who is Asian",
"Other: Share of population who belong to other minority groups")
) %>%
dplyr::select(-order) %>%
knitr::kable(col.names = c("", "Median (IQR)"),
digits = 1,
caption = "Descriptive Statistics for Residence Block Groups")
## Warning: `tbl_df()` is deprecated as of dplyr 1.0.0.
## Please use `tibble::as_tibble()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
## `summarise()` ungrouping output (override with `.groups` argument)
| Median (IQR) | |
|---|---|
| Density: Households per square kilometer | 1,335.5 (880.3, 2,183.1) |
| Income: Median tract income | 92,610.5 (64,110.0, 126,579.2) |
| Low Income: Share of households making less than $35k | 13.7 (7.3, 24.4) |
| High Income: Share of households making more than $125k | 33.9 (18.7, 50.8) |
| Black: Share of population who is Black | 6.8 (1.8, 17.8) |
| Asian: Share of population who is Asian | 21.2 (10.9, 38.8) |
| Other: Share of population who belong to other minority groups | 6.6 (2.4, 16.9) |
write_rds(attributed_parks, here("data/attributed_parks.rds"))
write_rds(acs, here("data/blockgroups.rds"))